L13: Geometric Operations with Vectors
Centroids, Distances, Combining and Dissolving, Buffers
Bogdan G. Popescu
John Cabot University
Intro
The main goals for today include:
Learn more about spatial funnction in the sf package
perform geometric calculations: areas, change units
calculating centroids and distances
making buffers and dissolving polygons
intersecting polygons and calculating line density by polygons
Geometric Calculations
Geometric Calculations can be divided into three groups based on their output:
Numeric : calculating areas, length, or polygon areas
Logical : whether feature A intersects feature B
Spatial : creating polygon centroids, buffers, or intersection area
Numeric Geometric Calculations
There are several functions pertaining to the sf package that are helpful to calculate numeric geometric properties:
Numeric Geometric Calculations
Download or copy the shape files for Spain: gadm41_ESP_shp
Put them in a data folder under the relevant week.
Read the level-2 shapefiles:
#Step0: Loading libraries
library (sf)
library (ggplot2)
sf_use_s2 (FALSE )
#Step1: Reading Spain borders at level 0 and 2
esp0 <- st_read (dsn= "./data/gadm41_ESP_shp/gadm41_ESP_0.shp" , quiet = TRUE )
esp2 <- st_read (dsn= "./data/gadm41_ESP_shp/gadm41_ESP_2.shp" , quiet = TRUE )
#Step2: Simplifying the shapefile for easier plotting
esp0<- st_simplify (esp0, dTolerance= 0.01 )
esp2<- st_simplify (esp2, dTolerance= 0.01 )
#Step3: Selecting only one variable
esp2b<- subset (esp2, select = c ("NAME_2" ))
Numeric Geometric Calculations: Area
We can use st_area to calculate the area of the provinces in Spain
#Calculating the area
esp2b$ area<- st_area (esp2b)
If we examine the first six entries, we see the following output:
Units: [m^2]
[1] 8797713021 7423953234 13774060764 12641812444 10103977438 13471380582
Meters or Square Meters are the default units for area and distance calculations.
We can convert the units objects to other type of units such as Square Kilometers
library (units)
esp2b$ area_sqkm<- set_units (esp2b$ area, 'km^2' )
esp2b$ area_sqkm[1 : 6 ]
Units: [km^2]
[1] 8797.713 7423.953 13774.061 12641.812 10103.977 13471.381
Numeric Geometric Calculations: Area
And here how they compare to the original calculation:
Transformation to Square Km:
Units: [km^2]
[1] 8797.713 7423.953 13774.061 12641.812 10103.977 13471.381
Original area:
Units: [m^2]
[1] 8797713021 7423953234 13774060764 12641812444 10103977438 13471380582
Notice however that the two columns are not numeric. They are units.
You can see that using str:
Units: [km^2] num [1:52] 8798 7424 13774 12642 10104 ...
Numeric Geometric Calculations: Area
We can make this a numeric variable in the following way:
esp2b$ area_sqkm_numeric<- as.numeric (esp2b$ area_sqkm)
esp2b$ area_sqkm_numeric[1 : 6 ]
[1] 8797.713 7423.953 13774.061 12641.812 10103.977 13471.381
We can check using the following:
str (esp2b$ area_sqkm_numeric)
num [1:52] 8798 7424 13774 12642 10104 ...
Numeric Geometric Calculations: Area
We can visualize the district area in the following way:
#Step1: Turning multipolygons into polygons
esp0b = st_cast (esp0,"POLYGON" )
#Step2: Calculating area
esp0b$ area<- st_area (esp0b)
#Step3: Selecting the largest polygon
esp0b<- subset (esp0b, area== max (area))
#Step4: Creating a box around the polygon
bbox <- st_bbox (esp0b)
#Step5: Extracting min and max latitudes and longitudes
min_lat_y <- bbox["ymin" ]
max_lat_y <- bbox["ymax" ]
min_lon_x <- bbox["xmin" ]
max_lon_x <- bbox["xmax" ]
error<- 0.05
ggplot (esp2b, aes (fill= area_sqkm_numeric)) +
geom_sf () +
geom_sf_text (aes (label = NAME_2), colour = "white" , size= 2 )+
theme_void () +
coord_sf (xlim = c (min_lon_x - error, max_lon_x + error),
ylim = c (min_lat_y - error, max_lat_y + error))+
scale_fill_viridis_c (option = "viridis" )
Numeric Geometric Calculations: Area
We can visualize the district area in the following way:
Numeric Geometric Calculations: Distance
We can calculate the distance between two features using st_distance.
Numeric Geometric Calculations: Distance
Let’s say that we want to calculate the distance between Avila and Valencia
Numeric Geometric Calculations: Distance
To do this, we can simply do:
Step 1: Calculating centroids
# Calculating polygon centroids
esp2_ctr<- st_centroid (esp2b)
This is how polygons translate into centroids
Show the code
ggplot ()+
geom_sf (data= esp2b, color= "black" , fill= NA )+
coord_sf (xlim = c (min_lon_x - error, max_lon_x + error),
ylim = c (min_lat_y - error, max_lat_y + error))
Show the code
ggplot ()+
geom_sf (data= esp2_ctr, color= "black" , fill= NA )+
coord_sf (xlim = c (min_lon_x - error, max_lon_x + error),
ylim = c (min_lat_y - error, max_lat_y + error))
Numeric Geometric Calculations: Distance
Step 2: Selecting the place of interest
place_int<- subset (esp2_ctr, NAME_2== "Valencia" | NAME_2== "Ávila" )
## Numeric Geometric Calculations: Distance
dist <- st_distance (place_int)
The result is matrix of units values
Units: [m]
[,1] [,2]
[1,] 0.0 378268.3
[2,] 378268.3 0.0
Rows refers to origins and column refers to destination
So the distance between Avila and Avila is 0 meters.
The distance between Avila and Valencia is 377692.8 meters
The distance between Valencia and Avila is 377692.8 meters
The distance between Valencia and Valencia is 0 meters
Numeric Geometric Calculations: Distance
The matrix dimensions are
c (nrow (place_int), nrow (place_int))
Numeric Geometric Calculations: Distance
We can of course calculate distances among multiple points
#Step1: Selecting the regions of interest
place_int2<- subset (esp2_ctr, NAME_2== "Valencia" | NAME_2== "Ávila" | NAME_2== "Madrid" )
#Step2: Mapping centroids onto polygons
ggplot () +
geom_sf (data= esp2b) +
geom_sf (data= esp2_ctr, color= "black" ) +
geom_sf (data= place_int2, color= "red" , size= 3 ) +
geom_sf_text (data= esp2_ctr, aes (label = NAME_2), colour = "black" , size= 2 , hjust= 0.5 , vjust= 1.5 )+
theme_void () +
coord_sf (xlim = c (min_lon_x - error, max_lon_x + error),
ylim = c (min_lat_y - error, max_lat_y + error))
Numeric Geometric Calculations: Distance
We can of course calculate distances among multiple points
Numeric Geometric Calculations: Distance
We can now calculate the distance
# Numeric Geometric Calculations: Distance
dist <- st_distance (place_int2)
The result is matrix of units values
Units: [m]
[,1] [,2] [,3]
[1,] 0.0 104415.2 378268.3
[2,] 104415.2 0.0 278746.6
[3,] 378268.3 278746.6 0.0
Numeric Geometric Calculations: Distance
The result is matrix of units values
Units: [m]
[,1] [,2] [,3]
[1,] 0.0 104415.2 378268.3
[2,] 104415.2 0.0 278746.6
[3,] 378268.3 278746.6 0.0
Rows refers to origins and columns refers to destination
So the distance between Avila and Avila is 0 meters.
The distance between Avila and Madrid is 104228.0 meters
The distance between Avila and Valencia is 377692.8 meters
The distance between Madrid and Avila is 104228.0 meters
The distance between Madrid and Avila is 0 meters
The distance between Madrid and Valencia is 278401.9 meters
And so on…
Numeric Geometric Calculations: Distance
Just like before, we can set the units to km
library (units)
dist2<- set_units (dist, 'km' )
dist2
Units: [km]
[,1] [,2] [,3]
[1,] 0.0000 104.4152 378.2683
[2,] 104.4152 0.0000 278.7466
[3,] 378.2683 278.7466 0.0000
Numeric Geometric Calculations
Thus, we have covered examples of numeric geometric calculations
st_box - to calculate the bounding box coordinates
st_area - to calculate the area of a feature
st_length - to calculate length of lines or perimeters
We now move to logical geometric calculations
Logical geometric calculations
There are are a variety geometric calculations including:
st_contains_properly
st_contains
st_covered_by
st_covers
st_crosses
st_disjoint
st_equals_exact
st_equals
st_intersects
st_is_within_distance
st_overlaps
st_touches
st_within
To see more details on what they do, check out the Package ‘sf’ Manual !
Spatial geometric calculations
The sf package provides common geometry operation functions
st_centroid
st_buffer
st_convex_hull
st_voronoi
st_sample
Spatial geometric calculations
Spatial geometric calculations
Spatial geometric calculations
Spatial geometric calculations
Spatial geometric calculations
Spatial geometric calculations: Centroids
A centroid is the geometric center of the geometry, represented by one point.
For example, we have already calculated polygon centroids
#Step1: Calculating centroids
esp2_ctr<- st_centroid (esp2)
#Step2: Mapping centroids onto polygons
ggplot () +
geom_sf (data= esp2) +
geom_sf (data= esp2_ctr, color= "black" ) +
geom_sf_text (data= esp2, aes (label = NAME_2), colour = "black" , size= 2 , hjust= 0.5 , vjust= 1.5 )+
theme_void () +
coord_sf (xlim = c (min_lon_x - error, max_lon_x + error),
ylim = c (min_lat_y - error, max_lat_y + error))
Spatial geometric calculations: Centroids
A centroid is the geometric center of the geometry, represented by one point.
For example, we have already calculated polygon centroids
Spatial geometric calculations: Combining and Dissolving
The st_combine and st_union functions can be used to combine multiple geometries into a single geometry
st_combine combines geometries into one. st_union dissolves internal borders
We can easily see the difference below:
library ("gridExtra" )
esp1 <- st_read (dsn= "./data/gadm41_ESP_shp/gadm41_ESP_2.shp" , quiet = TRUE )
p1<- ggplot ()+
geom_sf (data= st_combine (esp1))+ theme_void () +
coord_sf (xlim = c (min_lon_x - error, max_lon_x + error),
ylim = c (min_lat_y - error, max_lat_y + error))+
ggtitle ("st_combine" )
p2<- ggplot ()+
geom_sf (data= st_union (esp1))+ theme_void () +
coord_sf (xlim = c (min_lon_x - error, max_lon_x + error),
ylim = c (min_lat_y - error, max_lat_y + error))+
ggtitle ("st_union" )
grid.arrange (p1, p2, ncol= 2 )
Spatial geometric calculations: Combining and Dissolving
The st_combine and st_union functions can be used to combine multiple geometries into a single geometry
st_combine combines geometries into one. st_union dissolves internal borders
We can easily see the difference below:
Spatial geometric calculations
We can also see their attribute table:
Original geometry:
Simple feature collection with 3 features and 13 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -6.473472 ymin: 35.93736 xmax: -1.630006 ymax: 38.72911
Geodetic CRS: WGS 84
GID_2 GID_0 COUNTRY GID_1 NAME_1 NL_NAME_1 NAME_2 VARNAME_2
1 ESP.1.1_1 ESP Spain ESP.1_1 Andalucía NA Almería NA
2 ESP.1.2_1 ESP Spain ESP.1_1 Andalucía NA Cádiz NA
3 ESP.1.3_1 ESP Spain ESP.1_1 Andalucía NA Córdoba NA
NL_NAME_2 TYPE_2 ENGTYPE_2 CC_2 HASC_2 geometry
1 NA Provincia Province 04 ES.AN.AM MULTIPOLYGON (((-3.030694 3...
2 NA Provincia Province 11 ES.AN.CD MULTIPOLYGON (((-5.841249 3...
3 NA Provincia Province 14 ES.AN.CO MULTIPOLYGON (((-4.248001 3...
Spatial geometric calculations
st_combine geometry
head (st_combine (esp1), n= 3 )
Geometry set for 1 feature
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -18.16153 ymin: 27.63736 xmax: 4.328195 ymax: 43.79153
Geodetic CRS: WGS 84
st_union geometry
head (st_union (esp1), n= 3 )
Geometry set for 1 feature
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -18.16153 ymin: 27.63736 xmax: 4.328195 ymax: 43.79153
Geodetic CRS: WGS 84
Spatial geometric calculations
Calculating the distance between Ávila and Valencia
# Subsetting the cities
avila <- subset (esp1, NAME_2 == 'Ávila' )
valencia <- subset (esp1, NAME_2 == 'Valencia' )
# Union
avila <- st_union (avila)
valencia <- st_union (valencia)
# Calculating centroids
avila_ctr = st_centroid (avila)
valencia_ctr = st_centroid (valencia)
# Calculating distance
d = st_distance (avila_ctr, valencia_ctr)
d
Units: [m]
[,1]
[1,] 378281.9
# Converting to km
set_units (d, 'km' )
Units: [km]
[,1]
[1,] 378.2819
Spatial geometric calculations
Mapping the centers
ggplot ()+
geom_sf (data= esp2)+
geom_sf (data= avila_ctr, color = "red" , size= 3 )+
geom_sf (data= valencia_ctr, color = "red" , size= 3 )+
theme_void () +
coord_sf (xlim = c (min_lon_x - error, max_lon_x + error),
ylim = c (min_lat_y - error, max_lat_y + error))
Spatial geometric calculations
Mapping the centers
Spatial geometric calculations
We can use st_combine to transform the points into a single MULTIPOINT geometry.
st_combine is similar to st_union but only combines and does not dissolve
p <- c (avila_ctr, valencia_ctr)
p <- st_combine (p)
p
Geometry set for 1 feature
Geometry type: MULTIPOINT
Dimension: XY
Bounding box: xmin: -4.945356 ymin: 39.37033 xmax: -0.8009094 ymax: 40.57105
Geodetic CRS: WGS 84
Geometry casting
We can use st_cast to draw a line between the two points
l = st_cast (p, 'LINESTRING' )
l
Geometry set for 1 feature
Geometry type: LINESTRING
Dimension: XY
Bounding box: xmin: -4.945356 ymin: 39.37033 xmax: -0.8009094 ymax: 40.57105
Geodetic CRS: WGS 84
We can also visualize it:
ggplot ()+
geom_sf (data= esp2)+
geom_sf (data= avila_ctr, color = "red" , size= 3 )+
geom_sf (data= valencia_ctr, color = "red" , size= 3 )+
geom_sf (data= l, color = "red" )+
theme_void () +
coord_sf (xlim = c (min_lon_x - error, max_lon_x + error),
ylim = c (min_lat_y - error, max_lat_y + error))
Geometry casting
We can use st_cast to draw a line between the two points
l = st_cast (p, 'LINESTRING' )
l
Geometry set for 1 feature
Geometry type: LINESTRING
Dimension: XY
Bounding box: xmin: -4.945356 ymin: 39.37033 xmax: -0.8009094 ymax: 40.57105
Geodetic CRS: WGS 84
We can also visualize it:
Buffers
Another type of operation is buffer
st_buffer calculates buffers - polygons encompassing all points up to the specified distance from a given geometry.
We can for example calculate 250km buffers around Avila and Valencia.
To be precise about our distances, we will use EPSG codes
# Transform your data to a suitable projected CRS that uses meters as units
# For example, you can use a UTM zone that covers your area of interest
# Here, I'm assuming your data is around Avila, Spain, so I'm using UTM zone 30N (EPSG:2062)
# See https://epsg.io/ and look for Spain
# You should choose an appropriate UTM zone for your data. Note that otherwise, this would be in degrees
avila_ctr_sp <- st_transform (avila_ctr, crs = 2062 )
valencia_ctr_sp <- st_transform (valencia_ctr, crs = 2062 )
#Creating the buffer at 250km. Note that unit is in meters: st_crs(avila_ctr_sp) - LENGTHUNIT under Datum
avila_250 = st_buffer (avila_ctr_sp, dist = 250000 )
valencia_250 = st_buffer (valencia_ctr_sp, dist = 250000 )
#Going back to the original crs in degrees
avila_250 <- st_transform (avila_250, crs = st_crs (esp2))
valencia_250 <- st_transform (valencia_250, crs = st_crs (esp2))
Buffers
ggplot ()+
geom_sf (data= esp2)+
geom_sf (data= avila_ctr, color = "red" , size= 3 )+
geom_sf (data= valencia_ctr, color = "red" , size= 3 )+
geom_sf (data= avila_250, color = "red" , fill= NA )+
geom_sf (data= valencia_250, color = "red" , fill= NA )+
#geom_sf(data=l, color = "red")+
theme_void () +
coord_sf (xlim = c (min_lon_x - error, max_lon_x + error),
ylim = c (min_lat_y - error, max_lat_y + error))
Buffers
Another type of operation is buffer
st_buffer calculates buffers - polygons encompassing all points up to the specified distance from a given geometry.
We can for example calculate 250km buffers around Avila and Valencia:
Geometry Difference from Pairs
There are four geometry-generating functions that operate on pairs of input geometries:
st_intersection
st_difference
st_sym_difference
st_union
Geometry Difference from Pairs
Geometry Difference from Pairs
Geometry Difference from Pairs
Geometry Difference from Pairs
Geometry Difference from Pairs
Geometry Difference from Pairs: union
We might be interested in the total area that is within 250km from both Anvila and Valencia
We can do this in the following way:
union_area<- st_union (avila_250, valencia_250)
ggplot ()+
geom_sf (data= esp2)+
geom_sf (data= avila_ctr, color = "red" , size= 3 )+
geom_sf (data= valencia_ctr, color = "red" , size= 3 )+
geom_sf (data= union_area, color = "red" , fill= NA )+
theme_void () +
coord_sf (xlim = c (min_lon_x - error, max_lon_x + error),
ylim = c (min_lat_y - error, max_lat_y + error))
Geometry Difference from Pairs: union
We might be interested in the total area that is within 250km from both Anvila and Valencia
We can do this in the following way:
Geometry Difference from Pairs: union
Remember that st_union can also be applied to an individual layer,
This returns a dissolved union of all geometries:
Vector layer aggregation
Sometimes, we may not want to dissolve all features into a single geometry
Sometimes, we may want to dissolve by an attribute
Vector layer aggregation
Let us assume that we did not have the polygons for provinces in Spain.
And we had instead polygons for sub-provinces.
Vector layer aggregation
We can visualize all the provices by color.
ggplot ()+
geom_sf (data= esp3, aes (fill= NAME_2))+
theme_void () +
coord_sf (xlim = c (min_lon_x - error, max_lon_x + error),
ylim = c (min_lat_y - error, max_lat_y + error))
Vector layer aggregation
We can visualize all the provices by color.
Vector layer aggregation
We can thus aggregate by NAME_2 - the variable that defines the name of the province
library (dplyr)
esp3 <- st_read (dsn= "./data/gadm41_ESP_shp/gadm41_ESP_3.shp" , quiet = TRUE )
esp3c<- esp3 %>%
group_by (NAME_2) %>%
summarize ()
And then map it out:
ggplot ()+
geom_sf (data= esp3c)+
theme_void () +
coord_sf (xlim = c (min_lon_x - error, max_lon_x + error),
ylim = c (min_lat_y - error, max_lat_y + error))
Vector layer aggregation
We can thus aggregate by NAME_2 - the variable that defines the name of the province
esp3 <- st_read (dsn= "./data/gadm41_ESP_shp/gadm41_ESP_3.shp" , quiet = TRUE )
esp3c<- esp3 %>%
group_by (NAME_2) %>%
summarize ()
And then map it out:
Vector layer aggregation
We can also aggregate by NAME_1
esp3 <- st_read (dsn= "./data/gadm41_ESP_shp/gadm41_ESP_3.shp" , quiet = TRUE )
esp3c<- esp3 %>%
group_by (NAME_1) %>%
summarize ()
And then map it out:
ggplot ()+
geom_sf (data= esp3c)+
theme_void () +
coord_sf (xlim = c (min_lon_x - error, max_lon_x + error),
ylim = c (min_lat_y - error, max_lat_y + error))
Vector layer aggregation
We can thus aggregate by NAME_1
esp3c<- esp3 %>%
group_by (NAME_1) %>%
summarize ()
And then map it out:
Vector layer aggregation
A common operation is to calculate density of lines within polygons
Download railways from Spain
For example, we would be interested in calculating density of railways within provinces
#Step1: Reading the polylines
rails <- st_read (dsn= "./data/europe-railways-shape/railways.shp" , quiet = TRUE )
#Step2: Calculating the length of rails in m
lengths <- st_length (rails)
#Step3: Uniting the lines
merged_line <- st_union (rails)
#Step4: Summing up the lengths of the original lines
total_length <- sum (lengths)
#Step5: Creating a new sf object with the merged line and total length
result_sf <- st_sf (geometry = merged_line, total_length = total_length)
ggplot ()+
geom_sf (data= esp2)+
geom_sf (data= result_sf, color = "red" )+
theme_void () +
coord_sf (xlim = c (min_lon_x - error, max_lon_x + error),
ylim = c (min_lat_y - error, max_lat_y + error))
Vector layer aggregation
A common operation is to calculate density of lines within polygons
For example, we would be interested in calculating density of railways within provinces
Vector layer aggregation
#Step1: Performing the intersection between rails and polygons
intersection <- st_intersection (rails, esp2)
#Step2: Calculating the length of the intersecting polylines
intersection$ length <- st_length (intersection)
#Step3: Saving only relevant vars
intersection2<- subset (intersection, select = c ("NAME_2" , "length" ))
#Step4: Dropping the geometry
intersection2<- st_drop_geometry (intersection2)
#Step5: Dividing the length by 1000 to obtain km
intersection2$ length<- as.numeric (intersection2$ length)/ 1000
#Step6: Performing the left join (left = polygons; right = lines)
esp2<- left_join (esp2b, intersection2, by = c ("NAME_2" = "NAME_2" ))
#Mapping
ggplot (esp2, aes (fill= length)) +
geom_sf () +
geom_sf_text (aes (label = NAME_2), colour = "white" , size= 2 )+
theme_void () +
coord_sf (xlim = c (min_lon_x - error, max_lon_x + error),
ylim = c (min_lat_y - error, max_lat_y + error))+
scale_fill_viridis_c (option = "viridis" )
Vector layer aggregation
Conclusion
We have covered a veriety of sf functions:
perform geometric calculations: areas, changing units
calculating distances
calculating centroids
making buffers and dissolving polygons
casting lines
intersecting polygons
aggregating polygons by attributes
calculating line density by polygona